Distributed Data Strategies to Support Large-Scale Data Analysis Across Geo-Distributed Data Centers
Similar resources
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Data Scheduling for Large Scale Distributed Applications
Current large-scale distributed applications studied by large research communities result in new challenging problems in widely distributed environments. In particular, scientific experiments using geographically separated and heterogeneous resources necessitate transparently accessing distributed data and analyzing huge collections of information. We focus on data-intensive distributed computing ...
Visualization of Large-Scale Distributed Data
Introduction: The primary goal of visualization is insight. An effective visualization is best achieved through the creation of a proper representation of data and the interactive manipulation and querying of the visualization. Large-scale data visualization is particularly challenging because the size of the data is several orders of magnitude larger than what can be managed on an average deskt...
Models for Distributed, Large Scale Data Cleaning
Poor data quality is a serious and costly problem affecting organizations across all industries. Real data is often dirty, containing missing, erroneous, incomplete, and duplicate values. Declarative data cleaning techniques have been proposed to resolve some of these underlying errors by identifying the inconsistencies and proposing updates to the data. However, much of this work has focused o...
Data Parallelism for Large-scale Distributed Computing
Large-scale computing systems are attractive for networked applications because they provide scalable infrastructures. To launch distributed data-intensive computing applications in such infrastructures, communication cost, for example to transfer data files to compute nodes, can be a critical challenge due to point-to-point bandwidth scarcity. One way to improve communication performance is to employ p...
Journal
Journal title: IEEE Access
Year: 2020
ISSN: 2169-3536
DOI: 10.1109/access.2020.3027675